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Abstract. Object recognition from sensory data involves, in part, determining the 
pose of a model with respect to a scene. A common method for finding an object's pose 
is the generalized Hough transform, which accumulates evidence for possible coordinate 
transformations in a parameter space whose axes are the quantized transformation param- 
eters. Large clusters of similar transformations in that space are taken as evidence of a 
correct match. In this article, we provide a theoretical analysis of the behavior of such 
methods. We derive bounds on the set of transformations consistent with each pairing of 
data and model features, in the presence of noise and occlusion in the image. We also 
provide bounds on the likelihood of false peaks in the parameter space, as a function of 
noise, occlusion, and tessellation effects. We argue that blithely applying such methods to 
complex recognition tasks is a risky proposition, as the probability of false positives can 
be very high. 
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1. Introduction 



The recognition of partially occluded objects from noisy data is an important com- 
ponent of many problems in vision and robotics. Recognizing an object generally 
entails finding a matching between elements of an object model and instances of 
those elements in the data, and thereby recovering a transformation that maps a 
model of the object onto a portion of an image. There are a variety of approaches 
to the problem of finding possible transformations (see [Besl and Jain 85], [Chin 
and Dyer 86] for recent surveys), a common subclass of which are based on trans- 
formation clustering. The generalized Hough transform [Ballard 81] [Davis 82], or 
related parameter hashing techniques, are often used to perform the transformation 
clustering (e.g., [Thompson and Mundy 87] [Silberberg et al. 84] [Silberberg et al. 
86] [Lamdan et al. 87] [Turney et al. 85]). 

In this paper, we consider the robustness of clustering methods based on varia- 
tions of the generalized Hough transform. We investigate the power of such methods 
to distinguish clusters that are due to a correct matching of image and model fea- 
tures from those that occur at random. We find that the methods work well as 
long as the correct match accounts for both much of the model and much of the 
sensory data. For moderate levels of sensor noise, occlusion, and image clutter, 
however, the methods can hypothesize many false solutions, and their effectiveness 
is dramatically reduced. 

The idea underlying transformation clustering methods is to accumulate inde- 
pendent pieces of evidence for a match. Each pair of model and image features (such 
as edges or vertices) defines a range of possible transformations from a model to an 
image. In the case of rigid objects, each transformation consists of a translation and 
rotation from the model coordinate system to the image coordinate system, and 
thus specifies the pose of the model with respect to the image. The uncertainty in 
the range of possible transformations depends on the type of feature, and on the 
degree of accuracy in the measurement of the features. 

Ranges of transformations consistent with a feature pair are computed for all 
pairs of model and image features. Those pairs that are part of the same correct 
match of a model to an image will result in approximately the same transforma- 
tions. Random pairs of model and image features, on the other hand, will result in 
randomly distributed transformations. Thus a cluster of similar transformations is 
assumed to correspond to a correct match. The validity of this assumption, how- 
ever, depends on there being a low likelihood that random clusters will be as large 
as those clusters resulting from correct matches. 

Two techniques are commonly used to find clusters in an n-dimensional param- 
eter space: fc-means clustering and the generalized Hough transform. These tech- 
niques both start with a set, P, of parameter vectors, or points in the w-dimensional 
parameter space, and yield a set of subsets of P, where each subset is a cluster 
of similar parameter vectors. In transformation clustering approaches to recogni- 



tion, each dimension of the parameter space, P, corresponds to a component of the 
transformation from a model to an image. 

The A;-means method is an iterative technique that starts by dividing the pa- 
rameter vectors into k groups, and then iteratively moves vectors from one group 
to another in order to minimize the total distance between elements in each group. 
The fc-means clustering algorithm requires a distance metric to be defined for com- 
paring any two parameter vectors. In the case of transformations from a model to 
an image it is difficult to define an appropriate distance metric, because the pa- 
rameter space consists of both translations and rotations, which are not directly 
comparable. A further limitation of the approach is that some pre-defined number 
of clusters, k, must be used. Thus the system must have a reasonable guess of 
how many meaningful clusters there are (i.e. how many object instances are in an 
image). 

Rather than using the fc-means method, object recognition systems tend to clus- 
ter transformations using the generalized Hough transform. The Hough technique 
works by quantizing the parameter space into discrete n-dimensional buckets. Each 
parameter vector is entered into a bucket by quantizing its n parameter values and 
using them as indices into an n-dimensional table. The quantization will generally 
map similar parameter vectors into the same bucket. Hence, the search for large 
clusters of similar transformations simply requires examining each bucket to find 
those buckets with the most entries. 

The remainder of this paper considers the effectiveness of using the generalized 
Hough transform to find clusters of similar transformations in order to match a 
model to an image. Three central questions are addressed in this investigation: 

1. What is the range of transformations specified by a given pairing of model and 
image features? 

2. How many Hough buckets are specified by such a range of transformations? 

3. How many model-image pairings are likely to fall into the same Hough bucket 
at random? 

The first two questions are considered in Section 3, which analyzes the amount of 
uncertainty involved in computing a two-dimensional transformation from a model 
to an image, using either pairs of straight edge fragments or pairs of vertices. In 
Section 4, the generalized Hough transform is modeled as an occupancy problem, in 
order to estimate the size clusters that are likely to occur at random. This analysis 
makes use of the analytic results from Section 3, as well as some empirical data 
from existing recognition systems. We find that for a wide variety of tasks, clusters 
occurring at random are as large in size as those that are due to a correct match. 
Thus for these tasks, the generalized Hough method is not a good technique for 
finding correct matches of a model to an image. 

A number of other authors have considered aspects of the noise sensitivity of 
the Hough transform, usually in the case of detecting lines or other simple curves in 
noisy images [Shapiro 75, Maitre 76, Cohen and Toussaint 77, Shapiro and Iannino 



79, Alagar and Thiel 81, van Veen and Groen 81]. Brown [1983] has considered the 
noise properties of more general applications of the Hough transform, by treating 
the problem as one of signal processing. In this article, we take a different approach, 
using discrete combinatorial tools to analyze the problem. 

Before addressing the three questions posed above, the next section considers 
the generalized Hough transform in more detail. Some of the limitations of the 
simplest formulation of the technique are considered, along with the methods that 
are generally used to overcome those limitations. Unfortunately, these methods turn 
out to the increase the likelihood that many model-data pairings will fall into the 
same Hough bucket at random. 



2. Parameter hashing: the generalized Hough transform 

The generalized Hough transform finds possible solutions to the object pose problem 
by searching for large clusters of evidence in a discrete version of a parameter space. 
A parameter vector, p, represents a point in an n-dimensional space, V. Each point 
in V maps to a point in the n-dimensional discrete Hough space, H, that is specified 
by quantizing each of the n components of p. The Hough transform method is often 
also referred to as parameter hashing, because each quantized parameter value is 
a hash key. Implementations of the Hough method generally use an n-dimensional 
table to represent H, and refer to the entries in the table as buckets. 

When the generalized Hough method is used for transformation clustering, each 
dimension of the parameter space, V, corresponds to a component of the transforma- 
tion from a model to an image. If the coordinate system of the image measurements 
is denoted by J, and the model coordinate system is denoted by M, then V is the 
space of mappings from M. to I. 

For each pair of model and image features, the range of possible transformations 
is computed. This set of transformations defines a region, T C V. The quantized 
values of this n-dimensional volume, T, are used to enter the model-image pairing 
into all the buckets in H that intersect the range of possible transformations. Those 
model-image pairings that fall into the same quantization bucket define a cluster of 
similar transformations. It is assumed that the large clusters will identify correct 
transformations from a model to an image. Thus recognition consists of searching 
the n-dimensional discrete table (the space H) for those buckets with a large number 
of entries. 

As an example, suppose that a model consists of linear segments, and the 
sensory data has been processed to produce comparable linear segments. Suppose 
there are m different model fragments, and s sensory fragments. Each sensory 
measurement taken from 1 is matched in turn with each model fragment, for a total 
of ms model-data pairings. Consider the pairing of data edge j with model edge 



J . We can compute the transformation required to bring the model fragment into 
correspondence with the data fragment. In two dimensions, this transformation 
can be defined as the angle of rotation Ojj needed to align the tangents of the two 
fragments, and the two dimensional translation tjj needed to then align the rotated 
model edge with the data edge. 

In the case of no uncertainty, (Ojj, tjj) exactly defines the transformation as- 
sociated with the data-model pair, jj. This transformation (Ojj, tjj) is represented 
by a point in the three-dimensional transform space V. If there is sensor error or 
partial occlusion, then the pairing jj defines a range of a possible transformations, 
represented by a volume in V. The corresponding parameters Ojj and tjj are quan- 
tized, and used to enter the pairing j J into those buckets of the three-dimensional 
Hough table that intersect the volume in V. 

There are three problems with the generalized Hough method as presented: 

1. Similar parameter vectors will end up in different buckets if they are on different 
sides of a quantization boundary. This problem is exacerbated by uncertainty in 
the parameter values. 

2. For high dimensionality parameter spaces, the table can get very large, making 
the search for large clusters cumbersome. 

3. The likelihood of large clusters occurring at random can be quite high, because 
the quantization integrates noise by collecting together all the random events 
within a bucket. The likelihood depends on the ratio of the number of parameter 
vectors to the number of buckets. 

Two methods are often used to ensure that similar parameter vectors end up in the 
same cluster. The first method computes clusters over a local k n neighborhood of 
buckets in the Hough table, rather than a single bucket. Generally a 3 n neighbor- 
hood is used, so that any transformations that are within one bucket of each other, 
along any dimension, will be clustered together. The second method computes the 
range of possible buckets that each data-model pairing could fall in, and enters it 
into each of these buckets. Both methods have the effect of increasing the number 
of parameter vectors entered into the table, thereby increasing the likelihood that 
large clusters will occur at random. 

Reducing the size of the table, so that search space is of a tractable size, in- 
creases the likelihood of large clusters occurring at random. The fewer buckets there 
are, the more likely that many parameter vectors will fall into the same bucket at 
random. Most systems that use the generalized Hough technique for clustering in 
high dimensional parameter spaces (such as six degree of freedom three-dimensional 
recognition tasks) use only a subset of the parameters to define the Hough table. 
This greatly reduces the size of the table, but at the same time greatly increases the 
chance of large random clusters. 

Thus the techniques used to address the first two problems exacerbate the third 
problem. It is this problem that we analyze using a combinatoric model in Section 



4. First, however, we derive bounds on the number of Hough buckets specified by 
each pairing of model and image data elements. 



3. Two dimensional noise analysis 

This section addresses the issue of what we will term the redundancy factor of 
entering transformations into the table. That is, how many different buckets in 
the Hough table can the same model-data pairing specify? This depends on the 
dimensionality of the sensory data, the dimensionality of the transformation from a 
model to an image, the coarseness of the tessellation of the Hough space, and the 
expected amount of noise in the sensory measurements. 

To determine the redundancy factor we need a method for estimating the set 
of transformations consistent with a data-model pairing, under different classes of 
allowed transformations. We begin with rigid two dimensional problems, using linear 
edge fragments. Details of the development are deferred to an appendix. Note that 
this is a specific case of using the generalized Hough transform. We will extend 
the arguments in later sections to deal with three-dimensional problems and to deal 
with problems involving change of scale. 

3.1 Rigid transformations 

Suppose we are considering the recognition of a two-dimensional polygonal model 
from noisy, occluded data. If M is the model coordinate system, we let 

Mj be the vector to the midpoint of a model edge, measured in M, 

Tj be the unit tangent of the edge, measured in M, 

L j be the length of the edge. 

We let mj,tj,tj denote similar parameters for a data edge, measured in the sensor 
based coordinate system, X. (Note that we use upper case characters to distinguish 
model parameters and lower case characters to distinguish sensory data parameters.) 

The transformation from model coordinates to sensor coordinates may be rep- 
resented by 

v s = R 8 Vm + V 
where Vjvf is a vector in model coordinates, Rg is a rotation matrix corresponding 
to an angle of 0, Vo is a translation offset, and v s is the corresponding vector in 
sensor coordinates. 

We need to know what transformations will map a model edge to a data edge. 
First, if tj > Lj, we assume that the two edges cannot match (we consider the 
case of variable scale in the next section). Thus, suppose that lj < Lj. Then the 
rotation needed to align the two tangents is given by the angle 6 m between Tj and 



tj, and this defines a rotation matrix R$ m . If we apply this rotation to the set of 
edge points 

we get a set of transformed points 



{ 



R*. 



Mj + aTj 






To align the edges, we need to translate these rotated points. Now, because 
f-i < Lj, there are many transformations that will cause the edges to overlap. 
Consider one endpoint of the data edge 

It* 
Pi = m i - ytj- 

If this happens to coincide with a model edge endpoint, 

Pi = Mj - ^-Tj 



then 



™j - fh = R e „ 



L 7 a 



+ V 



so that the translation is 



V = mj - R 8m Mj + y-^LR tm Tj 
because iE« m Tj = tj. Similarly, if the other endpoints align, we get 

V = mj - Re m Mj - ±±zJ±R 6m Tj. 

Because any intermediate position is also acceptable, the set of translations consis 
tent with matching model edge J to data edge j is given by 

■Lj — "j Li j — tj 



| m j- 



Re m Mj + aR 9m Tj 



a€ 



']}■ 



(1) 



Hence, matching model edge J to data edge j yields a set of points in transform 
space V, with a single value for the rotation parameter and a set of values for the 
translation, that correspond to a line of length Lj — tj, with orientation ^ ra Tj in 
the x-y plane. 

This, however, ignores the issue of noise in the measurements. In practice, we 
may only know the position of the endpoints of the data edge to within some ball 
(which in two dimensions is just a circle) of radius e p , and the orientation to within 
an angular error of e a . For the case of two dimensional lines, these error ranges 
are related. Given endpoint variations of e p , it is straightforward to show that the 
maximum angular variation is when the correct line is tangent to both circles of 
radius e p about the two endpoints, and is given by 



e a = tan 



-l 



2e r 



V^ 1 



4el 



provided £ > 2e p . 

Inclusion of error effects on position measurements imply that the line of feasible 
translations, for a given rotation, (as given by equation (1)), must be expanded to 
include any points in the parameter space within e p of that line. Further, this 
expansion into a region must be repeated for each value of 9 in [9 m — c a ,9 m + £a]- 
Note that this carves out a skewed volume in Hough or transform space, because the 
region's center and orientation are functions of 9 (see equation (1)). This observation 
has been carefully analyzed in [Clemens 86]. 

Thus, given Mj,Tj, Lj,mj,ij,£j, we will use the following conditions: 

• If £j — 2e p > Lj, then there are no consistent transformations, 

• Otherwise, the set of feasible transformations is denote by the volume 

V(j,J)= (J S(0,j,J) 

where an individual set of translations is denoted by: 



S(9,j,J)={(9,V Q ) 



3«, M < L \ lj , IK - R e Mj + aTj - VoH < e p 



These conditions imply that if a model-data pair of edges satisfy the unary 
constraint of length agreement, then there is a set of transforms that must all be 
considered as consistent. 

We can already use these results to estimate the size of the set of feasible 
transformations. Some simple manipulations indicate that the volume of the region 
defined above is given by 

2e a [2e p (Lj - £j) + ne 2 p ] . 

Of more interest is the number of buckets in the Hough space that are consistent 
with such volumes. 

If the Hough space H were continuous, and hence identical to the transform 
space V, then we would simply need to compute all such volumes, over all data- model 
pairings, and let f(6,t) denote the number of volumes that contain the point (0, t). 
Then the correct interpretation would be the point at which / attains a maximum. 
However, in real systems, one usually tessellates the transform space V into non- 
infinitesimal buckets to obtain the Hough space H. We let the dimensions of the 
Hough buckets be he along the rotation axis, and h t along each of the translation 
axes. Thus, we really want to determine the number of buckets that intersect one 
of these volumes, as that will determine the redundancy of the hashing scheme. 

We begin by considering the plane of buckets consistent with a rotation value 
of 9 C . Suppose we let B(9,j,J) denote the set of buckets in this plane that intersect 
the slice S(9,j,J). As 9 varies from 9 C to 9 C + h$, the slice S(9,j,J) changes, and 
hence the set B(9,j, J) may also change. To determine the entire set of buckets 

U B(0,j,J) 

0€[0c,0c+M 




Figure 1. Region of feasible translations. The outlined area denotes the set of translations 
that are consistent with a data-model pairing, as the orientation ranges over the size of a 
Hough bucket. Details of the development are given in the appendix. 

we can first project each slice in x-y, S(0,j,J) onto the x-y plane, and then find 
the number of buckets that intersect the union of these projections. 

The set of feasible translations under this projection is shown in Figure 1. In 
the appendix, we show that a lower bound on the expected redundancy factor for 
pose clustering, b, i.e. the number of buckets into which a single data-model pairing 
casts a vote is given by 

b> |"S2. \A* + (h e ,e;,M*,L*,/3)} (2a) 



where the bound on angular error is given by 



e a = tan 



-l 



2e! 



y/(£*y-4(e;)\ 



(26) 



and where the modified area is given by 

A\(h 9 , e;,M*,L*,/3) > 2M*(1 - f3)L*^- + tz (e* p ) 2 + 2e p L*(l - f3) + 2e* p h e M* 

+ —= (2M*h $ + 2(1-(3)L* + 2^;) . (2c) 

Note that this expression depends on the distance of the midpoint of the model 
edge from the center of the coordinate system, M, on the model edge length L, 



on the length of the data edge, which we have assumed to be a fraction 0L of the 
model edge length, on the bound in position uncertainty, e p , and on the size of the 
rotational dimension of the Hough bucket. The expressions also depend on the size 
of the translation dimension of the Hough bucket, which we have normalized for, 
using 

p h t h t h t 

We have omitted the subscripts in the above expression, in an attempt to maintain 
readability of the expression. 



3.1.1 Examples 

To demonstrate the effect of this redundancy, we consider some representative exam- 
ples. For simplicity, we will consider an object with equal length sides (Lj = L = 50 
pixels VJ) and with constant offset of the midpoint of each edge from the centroid 
of the object, (Mj = M = 100 pixels VJ). We will assume that the size of the 
image is 500 pixels on a side. We consider two different tessellations of the Hough 
space, h t = 5, hg = tt/36 and h t = 25, hg = 57r/36. For each of these, we consider 
three different error bounds on the sensory data, e p = 2.5, 5 and 10 pixels. We also 
consider three different levels of fragmentation of the data edges, that is, the fraction 
of the model edge actual obtained in the image as a data edge. This is given by 
setting = 2e p /L, .5, 1.0, corresponding to the smallest allowed size, to half the size 
of the model edge, and to the case of no occlusion of the edges. Recall that refers 
to the ratio of the length of the deta edge to the length of the model, and reflects 
the amount of occlusion present in an individual edge. Tables 1 and 2 summarize 
the redundancy b for each of these case, shown both in terms of the actual number 
of buckets, and as a fraction of the total buckets in the tessellated space, using the 
bounds of equation (2). 
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Table 1. Redundancy of Hough hashing, for tessellations of ht = 5 and hg = x/36. The 
lower bound on actual number of buckets hashed, and the fraction of the total number of 
buckets is given, for a single data-model pairing. The total number of buckets in this case 
is 720, 000. 

The redundancies reported above apply to a single data-model pairing, and the 
examples reported in Tables 1 and 2 use particular values of the length of the model 
edge, and its offset from the origin of the model coordinate system. Very similar 
redundancies hold for other values of these parameters, however. In Table la, we 
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Table 2. Redundancy of Hough hashing, for tessellations of ht = 25 and hg = 5jt/36. The 
lower bound on actual number of buckets hashed, and the fraction of the total number of 
buckets is given, for a single data-model pairing. The total number of buckets in this case 
is 5, 600. 



show the redundancies obtained for fixed values of error, e p = 5, and a fixed bucket 
size, h t = 5, h e — 7r/36, but with varying edge length L and varying model offset M. 
One can see that considerable variation in these values yields similar redundancies. 
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Table la. Redundancy of Hough hashing, for tessellations of ht = 5 and hg = 7r/36. The 
lower bound on actual number of buckets hashed is given for a single data-model pairing. 
The error is fixed at e p = 5, occlusion is fixed at /3 = .5 and the length and offset of the 
model edges are varied. The total number of buckets in this case is 720, 000. 

The data in Tables 1 and 2 deal with extended edge fragments. If the data is point 
data, for example, vertices, then (3 = 1 and L = 1. In this case, we need some other 
means of estimating the orientation, and for illustrative purposes we use e a = w/36. 
This is a tighter bound than that used in the previous examples. The redundancy 
for the two different tessellations of the Hough space, and for the different positional 
error bounds are shown in Table 3. 
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Table 3. Redundancy of Hough hashing, for point data. The error in measuring the normal 
is assumed to be e a = x/36. The lower bound on actual number of buckets hashed, and 
the fraction of the total number of buckets is given, for a single data-model pairing. The 
number of buckets is 720,000 for the left part of the table, and 5,600 for the right. 

All of the above examples involve the use of a full three-parameter Hough space. In 
many cases, it is common to use the projection of that space onto a smaller subspace, 
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Table 6. Redundancy of Hough hashing, for point data, using projection of the full space 
onto the two dimensional translation subspace. The error in measuring the normal is 
assumed to be e a = x/36. The lower bound on actual number of buckets hashed, and 
the fraction of the total number of buckets is given, for a single data-model pairing. The 
number of buckets is 10, 000 for the left part of the table, and 400 for the right. 

typically using the projection onto the translational subspace. We can also derive 
estimates of the redundancy of this method. We can use the same equations as 
before, with some minor changes. First, the swept area of the translational subspace 
is given by considering the full range of rotational values, 2e in place of h$. Second, 
the redundancy factor is obtained by considering only the translational subspace, 
and is given by 

b> \A* + (2e a ,e;,M*,L*,(3)}. (3) 

Examples of the redundancy, using equation (3) are shown in Tables 4, 5, and 6. 
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Table 4. Redundancy of Hough hashing, for tessellations of ht = 5 and hg = 7r/36, using 
projection of the full space onto the two dimensional translation subspace. The lower bound 
on actual number of buckets hashed, and the fraction of the total number of buckets is 
given, for a single data-model pairing. The total number of buckets in this case is 10, 000. 
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Table 5. Redundancy of Hough hashing, for tessellations of ht = 25 and he = 5?r/36, 
using projection of the full space onto the two dimensional translation subspace. The 
lower bound on actual number of buckets hashed, and the fraction of the total number of 
buckets is given, for a single data-model pairing. The total number of buckets in this case 
is 400. 
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Several observations are in order. First, the tables show that in general, the redun- 
dancy of Hough hashing can be quite large, both in terms of the number of buckets 
consistent with a single data- model pairing, and in terms of the fraction of the Hough 
space deemed consistent with such a pairing. As expected, when one considers the 
matching of vertices to vertices in place of matching edges to edges, the redundancy 
improves. This is to be expected, since in the edge case, a partial edge can slide 
along its corresponding model edge, leading to more feasible transformations. 

As well, when the sensor error is reduced, the redundancy improves. Increasing 
the coarseness of the Hough tessellation can reduce the total number of buckets into 
which a data- model pair votes, but in general this increases the fraction of the total 
number of buckets selected. In general, the analysis and examples argue that the 
redundancy of Houghing can be quite high. 

While we have provided examples of several levels of sensor error, we note that 
the higher levels of error are probably more indicative of the situation encountered 
with real images. Several factors will contribute to the bound for e p . First, aberra- 
tions in the optics will cause the recorded edges to deviate from the actual physical 
edge. Second, smoothing effects in the edge detector will add to the displacement of 
recorded edges. The amount of deviation will depend on the specifics of the opera- 
tor, but 1 or 2 pixel errors are likely to be common. Third, using a split-and-merge 
operation to extract linear segments from grey level edges will further add to the 
error, typically by several pixels, so that overall error bounds of at least 5 pixels are 
to be expected. 

3.2 Scaled transformations 

Suppose we now allow the objects to scale, as well as rotate and translate. In this 
case the transformation from model to sensor coordinates is given by 

v 3 = kR e V M + V 
where Vm is a vector in model coordinates, R$ is a rotation matrix corresponding to 
an angle of 0, Vo is a translation offset, A; is a scale factor and v s is the corresponding 
vector in sensor coordinates. 

In this case, the set of feasible translations corresponding to a data-model pair- 
ing is a function of both the scale and the rotation: 

/tX/jr — Z-j K Ju j — £j 



l mj -kRe m Mj + aR em fj 


oe 


this case, the scale has a minimum bound of 


k2 


- Lj' 



*]} 



To determine the redundancy factor for parameter hashing in the case of scale, 
we again want to determine the number of buckets consistent with a data-model 
pairing for a single slice of the x-y components of the transform space. Note that 
in this case, the transform space T is four dimensional, with an extra axis for the 
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scale factor. Projecting the volume obtained as varies over the bounds of a single 
bucket gives us the volume shown in Figure 2, where now the borders are functions 
of the scale factor. If we now look at the projection of the volume as k is varied, 
we will get the region obtained by varying the region in Figure 5 over the range of 
values of k. This new region is shown in Figure 3. 




Figure 2. Rotation of the line of feasible translations through hg radians. 



Using an analysis similar to the previous case (details are given in the appendix), we 
can derive bounds on the redundancy in the case of objects that can scale. Suppose 
we define the full range of possible scale factors to be [1, k max ], so that the model is 
defined as the smallest possible instance of an object. Then to count the redundancy 
factor in this case, we must sum the number of buckets obtained over all possible 
scale factors. If the spacing of the Hough buckets in the scale dimension is h k , then 
this sum is given by: 



b s > 
where 



2e a 



I* 
A* i+ (i s - max{l, JT^-}^^ h k) 



*mtpf 



+ £ 



2e a 
he 



\A* s<+ (h k ,ih k )] (5) 



i s = 



max(l, £■) 
h k 
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Figure 3. Region of translation space consistent with scale variation and angle variation, 
and where 



2V2 



>; >h e ((fc fc + k e )M*j - ^Az}) + {k h + k e )L*j - 21) + 2(k h - k e )M} + 27re; 



A* a (Ak, k h ) =h e k h M} (k h L} - £*j) 



+ h e Ak 



h - 



Ak 



w? + <2£) _ m 



+ — M*j \{2k\ + 2k h Ak + Ak 2 )L* - (2k h + Ak)£*] 



Tt 



+ M;A*[(* fc -^)xj-| 



+ heel 



Ak 
(2k h -Ak)M*j-—L*j 



+ 



2e* p (2 (k h -^f)L*j- 2/j) 



+ 2€;AkM* J + K(e;)\ 

The final, rather messy, expression is a function of the range of variation in scale 
Ak as well as the maximum value of the scale parameter kh- 

We can use this to generate example redundancies. Tables 7 and 8 show the 
redundancy, for the case of M = 100, L = 50 using a fine tessellation of h t = 5, h$ = 
3£ and using 100 buckets in the scale dimension. We consider both the case of 
kmax = 2 and k max = 10 (Tables 7 and 8 respectively). 
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1.5 


e p = 2.5 


36320 


.00050 


12444 


.00017 


2742 


.00004 


e p = 5 


99190 


.00138 


28925 


.00040 


7872 


.00011 


e p = 10 


345752 


.00480 


95500 


.00133 


23863 


.00033 



Table 7. Redundancy of Hough hashing, including a scale dimension with range from 1 to 2 
in increments of .01. Tessellations are ht = 5 and hg = x/36. The lower bound on expected 
number of buckets hashed, and the fraction of the total number of buckets is given, for a 
single data-model pairing. The total number of buckets in this case is 72, 000, 000. 
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e p = 2.5 


518455 


.00720 


286998 


.00399 


37887 


.00053 


€ p = 5 


1153870 


.01603 


531745 


.00739 


41820 


.00058 


e P = 10 


3063967 


.04256 


1281950 


.01780 


99908 


.00139 



Table 8. Redundancy of Hough hashing, including a scale dimension with range from 1 
to 10 in increments of .09. Tessellations are ht = 5 and hg — x/36. The lower bound on 
the expected number of buckets hashed, and the fraction of the total number of buckets 
is given, for a single data-model pairing. The total number of buckets in this case is 
72,000,000. 

We can also do the parameter hashing by projecting onto a subspace of the 
full space. In the case of allowing scale to vary, for instance, we can consider 
the projection of the 4D volume into the normal 3D space spanned by the two 
translational and one rotational dimensions. The data for the cases of Tables 7 and 
8 under this projection are given in Tables 9 and 10. 
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.5 


P 
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e p = 2.5 


1860 


.00258 


897 


.00125 


310 


.00043 


€p = 5 


4180 


.00581 


1675 


.00233 


704 


.00098 


€ p = 10 


11308 


.01571 


4120 


.00572 


1568 


.00218 



Table 9. Redundancy of Hough hashing, including a scale dimension with range from 1 to 
2, projected onto the normal 3D space. Tessellations are ht = 5 and hg = x/36. The lower 
bound on the expected number of buckets hashed, and the fraction of the total number of 
buckets is given, for a single data-model pairing. The total number of buckets in this case 
is 720, 000. 



All of the examples of this section argue strongly that the redundancy of Hough 
transforms, in the presence of sensor error and partial occlusion of data elements, 
is quite high. In particular, the number of buckets in the Hough space that are 
consistent with a data-model pairing can be a significant portion of the total Hough 
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59130 


.08213 


34113 


.04738 


6393 


.00888 


e p = 5 


121180 
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6558 


.00911 


e p = 10 


279488 


.38818 


122190 


.16971 


13788 


.01915 



Table 10. Redundancy of Hough hashing, including a scale dimension with range from 
1 to 10, projected onto the normal 3D space. Tessellations are ht = 5 and h^ = x/36. 
The lower bound on the expected number of buckets hashed, and the fraction of the total 
number of buckets is given, for a single data-model pairing. The total number of buckets 
in this case is 720,000. 

space. This relative redundancy increases with increasing error, with occlusion of 
data edges, when scaling is included as a free parameter, and when projections of 
the full parameter space onto subspaces is used. When point data are used with 
minimal error, the redundancy of the Hough technique is more reasonable, but in 
general cases, the method has severe redundancy problems. 



3.2 Three dimensional problems 

We can also extend our method of analysis to three dimensional problems. In this 
case, we assume that we are matching planar patches of 3D data, together with an 
estimate of the surface normal of the patch, against comparable planar model faces. 
For ease of analysis, we will assume circular patches. As in the 2D case, we need to 
determine the volume in transform space consistent with a pairing of a data patch 
and a model face, and then determine the number of Hough buckets intersected by 
the volume. 

To represent the transform space, we use: 

• a cubic cell tesselation of the subset of H 3 defining legitimate translations of 
the mode. Each bucket has sides of size ht. 

• a partition of the surface of the Gaussian sphere, used to denote the axis of 
rotation of the model. Each section has an area of h r . 

• a partition of the range [0,2tt) for the angle of rotation about the axis given 
above. Each section has a size of h$ . 

Now, we first consider the rotation part of the transform. Given a model normal 
N and a measured data normal n, there is a set of rotation vectors, and associated 
angles, that will cause N to rotate into n. This set of rotation vectors {r} consists 
of those unit vectors lying on the great circle of points on the Gaussian sphere, 
equidistant from n and N. Equivalently, they are the set of unit vectors r such that 

< r,N-n>=0 

where the special case of N = n is treated separately. 

Now, the data normal n is not exact, but deviates from the correct normal by 
some error. We assume that n lies within a bounded range of the actual normal iio, 
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given by 

< n,n >> cose a . 

We need to estimate the set of feasible rotation vectors r as n varies over the e a -cone 
about no. That set is given by the region swept out on the Gaussian sphere by the 
great circle perpendicular to the unit vector m in the direction of N — n as n varies 
over the range defined by 

< n,iio >> cose a . 

To see what this looks like, consider the case in which no = — N. Then m must 
lie in an ^"--cone about N (since N — no = 2N and n varies within a cone spanned 
by e a , the intersection of this cone with the Gaussian sphere gives an ^- cone). 
This means that the perpendicular great circle sweeps out a band about the great 
circle perpendicular to N, with a maximum deviation of ^- on either side. We can 
straightforwardly evaluate the area swept out, and it is given by 

A • ^ 

47rsm — . 
2 

Now consider what happens as no varies from the special case of fto = — N. We 

let a denote the angle between N and no. First, the length of the vector N — no 

decreases to 

. . a 
2 sin — . 
2 

Second, the e a -cone about iio now becomes a skewed cone about N — xio- We can get 

a lower bound on the size of the largest regular cone contained within this skewed 

cone. The geometry is shown in Figure4. 




Figure 4. Geometry for determining the cone of possible vectors N - n for < n, no > > cos e. 



To determine the scope of this new cone, we need to solve for y, as shown in the 
figure. Appropriate trigonometry yields 

sine tan f- 
tan y = — 



2 tan %■ + sin e 
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As N — n varies over this cone, the great circle perpendicular to it will sweep out an 
area on the surface of the Gaussian sphere, and simple integration shows that this 
area is given by 

We need to obtain a bound for this area. This expression is minimized for a = 0, 
but this corresponds to the special case in which N is a fixed point of the rotation. 
In this case, while there is no uncertainty in the axis of rotation, there is complete 
uncertainty in the angle of rotation, and hence in this case, such a pairing would 
intersect 

— 

he 

buckets in the rotation part of the transform space. In general, however, the surface 
normal will not be a fixed point of the rotation. In this case, (which we treat as 
a > € to handle the noise in the system), the rotation angle is uniquely determined, 
but the axis of rotation is uncertain. The minimum uncertainty is given by a = e a , 
and the minimum area swept out on the Gaussian sphere is bounded below by 

47T 

\A + (;ib+uh0 2 - 

Hence, given that each bucket in the Hough space has an area on the Gaussian 
sphere of h r , a pairing of a model and data patch intersects at least 

4?r 1 



1 + 






tan §■ 



\2/l r ' 



Next, we consider the translation component of the transform. Suppose we 
have a model patch of radius R and a data patch of radius r. Once we have rotated 
the model, we can slide the transformed model patch so that it contains the data 
patch. There are a set of possible translations consistent with this, and they are 
delimited by a circle of radius R — r in some slice of the translation components of 
the transform space. When we include the effects of positional error (e p ), we get a 
disk of radius R — r + e p and height e p , so that the volume of consistent translations 
is 

tt(# - r + e p ) 2 2e p 
and hence such a volume intersects at least 

2tt€ p (R - r + e p ) 2 

A? 
buckets. Thus, by putting all of this together, we see that the redundancy factor in 
the 3D case is bounded by: 



b> 



4w 



1 



1+ (^-+^r) 
ysin e ' tan 5 ) 



2 h r 



2ire p (R -r + e p ) 2 
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As an example, we consider the case in which e a 
5, the results of which are shown in Table 11. 



w/10,e p = 5,h e = Air/ 100, ht = 





0=.2 


.5 


1.0 


R=20 


888 


456 


56 


=50 


4072 


1816 


56 


=100 


14528 


6088 


56 



Table 11. Redundancy of Hough hashing, for three dimensional problems. Tesselations are 
ht = 5 and h r — 4x/200. The lower bound on actual number of buckets is given, for a 
single data-model pairing. The parameters varied are the amount of occlusion p and the 
size of the model face R. 

Note that the bounds derived here are quite weak. We could obtain much tighter 
bounds, but feel that these suffice to demonstrate that the same problems observed 
in two dimensions also hold in three. 



4. An Occupancy Model of the Hough Transform 



The previous section has addressed the issue of the number of Hough buckets that 
are consistent with a pairing of a sensory feature and a model feature. The second 
question to be addressed in considering the efficacy of the Hough transform for find- 
ing solutions to the recognition problem, is the likelihood of large random clusters 
occurring at random in Hough space. 

Recall that the recognition problem, when using Hough transforms, is to use all 
pairings of model and image features to compute transformations from the model to 
the image. Each parameter of a given transformation is quantized, and the transfor- 
mation is entered into the appropriate buckets of an n-dimensional table. Buckets 
containing a large number of transformations (a peak) are taken to correspond to 
an instance of the object in the image. Significantly large clusters are either identi- 
fied by a threshold on the number of transformations in a bucket, or by using the 
largest few buckets. In either case, the size of peak, /, that corresponds to a correct 
match of the model to the image should be large enough that it is not likely to occur 
at random. Note that / will be at most some fraction of m, corresponding to the 
fraction of the model features that are matched to image features. 

In this section, we consider the robustness of this approach, given the bounds de- 
rived in the previous section on the number of buckets for which a single data-model 
pairing may vote. We model the generalized Hough transform as an occupancy prob- 
lem, in order to obtain an estimate of the probability that a Hough bucket will have 
peaks of size / or more at random. This probability should be very small in order for 



20 

the technique to identify primarily true instances of an object in an image, rather 
than random groupings of features. 

If the transformations from a model to an image were uniformly randomly dis- 
tributed over the parameter space, then the probability that a given transformation 
would fall into a particular bucket would be ^, where n is the number of buck- 
ets. If each instance was independent of the other instances, the probability that r 
transformations would fall into a given bucket is n~ r . To the extent that transfor- 
mations are not uniformly and independently distributed, they will tend to clump 
together more than indicated by this model. Thus modeling the transformations as 
uniformly randomly distributed yields a conservative model of the actual distribu- 
tion. The true distribution will yield random peaks that are at least as large as the 
uniform case. 

Given a distribution of r events into n cells, one can speak of the occupancy 
numbers, or the number of events in each cell, denoted by rx,...,r n , where each 
7-j > and Y^ r i = r - K the events are randomly distributed such that each of 
the n r placements have the equal probability, n -r , then the probability of a given 
arrangement with occupancy numbers r\ , . . . , r n is 

r\ 

'• — r 

Pri,...,r n — , . Z~l n 

niT?'. •• -r n l 
This distribution of events is often termed the classical occupancy problem, or 
Maxwell-Boltzmann statistics (for a standard text see [Feller 68]). 

For the classical occupancy problem, the probability, pk, that a given cell con- 
tains exactly k events is given by the binomial distribution, 

r—k 
I r \ l i _ i \ 

Pk 



( r )-(i--)' 

\kj n k \ nj 



We are interested in the probability that a given cell will contain / or more events 

at random, which is 

i-i 

P>i = l-2 Pfc - 

fc=0 

The expected number of cells in a Hough table that will contain peaks of size at 

least / is then given by 

E>i — np>t 

where n is the number of cells in the table. Ideally, the peaks corresponding to 

correct matches should be of a sufficient size, /, that E>i < 1. In other words, 

ideally the expectation should be that there will be less than one false peak in the 

table. 

For even moderate values of n and r, the computation of py. becomes unwieldy. 

For sufficiently large values of n, however, the Poisson approximation to the binomial 

can be used. The error of this approximation is proportional to n -1 , so for n's of 

the size discussed in the previous subsection (10 4 or larger) the error is relatively 

small. Using this approximation, 

X k 
Pk « -j^e \ 
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where A = £. Thus the parameter A is the ratio of the number of elements entered 
into the table, over the number of buckets. 

In addition to the Maxwell-Boltzmann distribution, another common distribu- 
tion used in occupancy problems is the Bose-Einstein statistic. This distribution has 
an experimental basis in particle physics, and assigns an equal probability to each 
of the occupancy numbers, ri,...,r n . Under the Bose-Einstein model, for large r 
and n, the limiting case is the so-called geometric distribution, where 



Pk 



(1 + A)*+! * 

This distribution has a long tail as k — ► oo, and thus predicts large peaks with a 
higher probability than does the Maxwell-Boltzmann model. Hence we use the more 
conservative model given by the Maxwell-Boltzmann distribution. 

4.1 Evaluating the Generalized Hough Transform 

To judge the effectiveness of the generalized Hough transform as a clustering tech- 
nique, the occupancy model will be used on some representative problems. First 
we will use the redundancy factors obtained in Section 3 to consider some two- 
dimensional recognition problems. Then we will examine some empirical data from 
a three-dimensional recognition system. 

The A parameter of the occupancy model is the ratio of the number of events 
entered into the table to the number of buckets. The number of events, r = msb, 
where m is the number of model features, s is the number of sensory features, and 
b is the redundancy factor. Thus A = msb/n, where n is the number of buckets in 
the table. 

We are interested in the likelihood of random peaks that are at least as large as 
those due to a correct match, where / is the size peak that is expected to result from 
a correct match. A match that correctly pairs all the model with image features 
will result in a peak of size / = to. Thus in general / = /to, where < / < 1 
is the proportion of model features that are correctly matched to image features. 
For a given problem, the values of 6 and n are fixed, and we will vary to and s to 
determine how many peaks of size / will occur at random, for / = .5m, I = .75to, 
and / = .9to. 

First we consider the case of using just the two translation parameters to enter 
transformations into the Hough table. With 5 pixel buckets there are a total of 
n — 10, 000 buckets. If the features are edges, then each pair of model and image 
features defines a range of transformations that intersect b = 116 buckets (with an 
error range of e p = 5 pixels and a fragmentation of = .5, as shown in Table 4). 
In this case, the generalized Hough technique is very poor at finding clusters that 
are due to a correct match. If there are more than 47 sensory data points, then the 
expected number of peaks of size / occurring at random will be always be larger than 
1, for any value of / < to. In other words, there will always be false matches if there 
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are more than 47 features in the image. Not only will there be more than one large 
peak at random, there will generally be many large peaks. For example, with 10 
model edges and 100 image edges the expectation is that 7209 of the 10,000 buckets 
will contain peaks of size 10 or more. Thus a two-dimensional translation Hough 
table is not well suited to the problem of clustering transformations by matching 
edges, even for uncluttered images with moderate error and occlusion. 

When the features consist of vertices rather than edges, the corresponding re- 
dundancy factor, 6, is 15 (for an error range of e p = 5, as shown in Table 6). The 
expected number of peaks that will occur at random are shown in Table 12, for 
peak sizes of I = .5m, / = .75m, and / = .9m. Cases where the expectation is less 
than 1 are indicated by a dash. Even though the number of redundant entries in 
the Hough table is much smaller for vertices than for edges, the number of false 
peaks is still quite high, even for moderately complex images. For example, for an 
image with 200 vertices and a model with 20 vertices, 90% of the model vertices 
must be matched in order for the expected number of false matches to be low (in 
this case 8). If only half of the model vertices are accounted for, then nearly every 
fourth bucket (2236 out of 10,000) will have a cluster as large as that resulting from 
a correct match. 
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m = 20 
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5591 
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s = 100, m = 20 


11 


- 


- 


m = 10 


186 


9 


- 


m = 5 


1734 


405 
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Table 12. Expected number of peaks occurring at random for various numbers of sensory 
features, s, model features, m, and visible fractions of model features, / ("-" indicates a 
value of < 1). For vertex features, where b — 15, and with a Hough table of n — 10,000 
buckets. 



The more model features that are correctly matched to image features, the larger 
the resulting cluster of transformations. Thus, another means of quantifying the 
power of the generalized Hough technique is to consider what the minimum number 
of model features must be in order for there to be an expectation of less than one 
random peak of size / = fm in the Hough table. This value is shown in Table 13 
for the task just considered, of a 10,000 bucket Hough table, vertex features, and 
b = 15. The entry N.P. for s = 250 and / = .5 means that there is no possible model 
size such that the expected number of peaks of size .5m is less than 1 when there 
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/ = .5 


.75 


.9 


s = 250 


N.P. 


52 


30 


100 


30 


14 


9 



Table 13. Size model required to have an expectation of less than one random cluster at 
least as big as the correct match, for various numbers of sensory features, s, and visible 
fractions of model features, /. For vertex features, where b — 15, and with a Hough table 
of n = 10, 000 buckets. 

are 250 or more image vertices. Thus again we see the limitation of this clustering 
method for the recognition of moderately cluttered scenes. 

Next we consider the case of using all three parameters to perform the cluster- 
ing. For translation buckets of 5 pixels and rotation buckets of tt/36 radians, there 
are a total of n = 720,000 buckets. For edge features, the redundancy factor, b, is 
300 (with an error range of e p = 5 pixels and a fragmentation of /3 = .5, as shown in 
Table 1). The expected number of peaks occurring at random are shown in Table 
14, for peak sizes of I = .5m, I = .75m, and / = .9m. For a moderately cluttered 
image, with s = 500 edges, and a model with m = 10 edges, the expected number 
of false peaks is over 40,000 if only half of the model edges are matched to image 
edges. If 9 of the 10 model edges are matched, then there is still an expectation of 
229 false peaks. 
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m = 10 
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53 


- 
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Table 14. Expected number of peaks occurring at random for various numbers of sensory 
features, s, model features, m, and visible fractions of model features, / ("-" indicates a 
value of < 1). For edge features, where b - 300, and with a Hough table of n - 720, 000 
buckets. 



Table 15 shows the number of model features required in order for there to be an 
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expectation of less than one random peak of size / = fm in the Hough table. For a 
relatively cluttered image with 500 edges, and high occlusion of 50%, a model must 
have at least 80 features before the expected number of false peaks is less than 1. 
Even for a simple image with only 100 edges, with moderate occlusion of 25%, a 
model must have at least 10 edges for there to be an expectation of no false matches. 
Thus even using the full three parameters for clustering, there is a high likelihood 
that random clusters will be as large as those due to a correct match. 
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Table 15. Size model required to have an expectation of less than one random cluster at 
least as big as the correct match, for various numbers of sensory features, s, and visible 
fractions of model features, /. For edge features, where b = 300, and with a Hough table 
of n = 720, 000 buckets. 

Finally, we consider the case of using vertex features and the three-dimensional 
parameter Hough table with n = 720, 000 buckets. The relevant redundancy factor 
is b = 22 (with an error range of e p = 5 pixels, as shown in Table 3). Table 16 
shows the expected number of peaks of a given size that will occur at random, and 
Table 17 shows the size model necessary to limit the expected number of false peaks 
to less than one. In Table 17 it can be seen that for all but very complex images, 
a match of a model with 10 or fewer features will result in an expectation of less 
than one false peak in the Hough table. Thus the method works relatively well for 
this case. The cost is quite high, however, because there are about two orders of 
magnitude more buckets to be searched than there are distinct transformations from 
the model to the image. The number of transformations is ms, which is at most a 
few thousand, whereas there are 720,000 buckets. 



The Generalized Hough Method for 3D Recognition 

In this section, we use some empirical data on the number of transformations from 
a model to an image to evaluate the power of the generalized Hough transform in a 
three-dimensional recognition task. As with the above results based on the analytic 
formulation of the two-dimensional problem, we find that the likelihood of large 
peaks occurring at random is very high for even moderately complex images and 
levels of uncertainty. 

For 3D recognition, the size of a full Hough table becomes prohibitive, so only a 
subset of the transformation parameters are used to form the table. For example, in 
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/ = .5 
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s = 1000, m = 10 
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382 
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5 = 500, m = 5 
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51 
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s = 250, m = 5 
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s = 100, m = 5 


83 


- 


- 



Table 16. Expected number of peaks occurring at random for various numbers of sensory 
features, s, model features, m, and visible fractions of model features, / ("-" indicates a 
value of < 1). For vertex features, where b = 22, and with a Hough table of n = 720, 000 
buckets. 
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Table 17. Size model required to have an expectation of less than one random cluster at 
least as big as the correct match, for various numbers of sensory features, s, and visible 
fractions of model features, /. For vertex features, where b = 22, and with a Hough table 
of n = 720, 000 buckets. 



[Thompson and Mundy 87] the two parameters of rotation out of the viewing plane 
are used for an initial clustering. The Hough buckets are of size 2°, yielding a total 
of n = 32,400 buckets. An error range of 15° is allowed, so each transformation is 
entered into an average of 8 2 = 64 buckets. A model has about m = 5 features, an 
image has about i = 3000 features, and this results in about 20,000 transformations. 
Thus a total of about r = 1280000 transformations are entered into the table, 
yielding a A of about 40. In order for the expected number of false peaks in the 
table, E>i, to be less than one, the peak size, I, must be 68. This is an order of 
magnitude larger than the number of model features, m. Peaks of size at least m, 
which is 5, will occur at random with a probability of 99%. In other words, this 
initial clustering eliminates virtually none of the candidates. 

Following the initial clustering, a secondary clustering is performed using the 
third rotation parameter. This parameter is again quantized in 2° buckets, so there 
are a total of n = 5,832,000 buckets in the three-dimensional table. Each transfor- 
mation is now entered in 8 3 = 512 buckets in order to allow for 15° errors. Thus 
20,000 transformations yields r = 10, 240, 000 table entries, and A = 1.8. In order for 
E>i < 1, the peak size, /, must be at least 11, which is a factor of two larger than the 
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number of model features. Peaks of size 5 occur with a probability of about 1%, so 
there will be nearly a hundred thousand false peaks in the three-dimensional Hough 
table. Thus the remaining three transformation parameters still must perform a 
good deal of work to eliminate the false matches. Even with the full 6 parameters, 
false matches sometimes remain [Thompson and Mundy 87]. Finally, the amount 
of search required is very large, as about 10 million buckets must be considered in 
order to find the buckets with peaks. 

In order to get a more complete picture of the utility of the generalized Hough 
transform for transformation clustering, Table 18 shows how large the peak size, 
/, corresponding to a correct match must be in order to limit the probability of a 
random peak of at least that size, p>i. The values are shown for various levels of 
A = £■, the ratio of number of table entries to number of buckets are shown, and 
various probabilities, p>j. Recall that in order for the expected number of false 
matches to be less than 1, the probability should be less than -. Thus for a Hough 
table with 10,000 entries the corresponding column would be 10 -4 , and for a million 
entries it is the 10 -6 column. 
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Table 18. Peak size, /, for different values of A = £, and different probabilities, P> t , of 
peaks at least as large as / occurring at random. 



5. Summary 



We have formally considered several aspects of the generalized Hough transform as a 
method for recognizing objects from noisy data in complex cluttered environments. 
We have analyzed both the redundancy of the bucketing operation, and the like- 
lihood that random clusters of transformations will be as large as those resulting 
from a correct match. The major results of this analysis are as follows: 
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1. We have shown analytically that the range of transformations specified by a 
given pairing of model and image features, can be quite large. This is particu- 
larly true in the case of extended features, which can be partially occluded in 
the scene, and in the presence of significant amounts of sensor uncertainty. 

2. We have shown analytically, and through representative examples, that the 
number of Hough buckets specified by such a range of transformations can also 
be quite large. The fraction of the total number of buckets that are specified by 
a single data-model pairing increases with increasing sensor uncertainty, with 
a reduction in the total number of buckets (i.e. increasing coarseness of the 
Hough space), with increasing occlusion, when projections onto subspaces of 
the full parameter space are used, and when scale is allowed to vary. 

3. We have shown, using an occupancy model, that the number of model-image 
pairings likely to fall into the same Hough bucket at random, can be quite high. 
As a consequence the clusters that occur at random are often likely to be larger 
than those that correspond to a correct solution. This may force a recognition 
system to examine large portions of the Hough space, in order to verify a correct 
interpretation from a spurious collection of parameter vectors. This problem is 
exacerbated as the redundancy factor increases, and hence is affected by changes 
in sensor uncertainty, Hough tessellation and scene complexity, as above. 



Our conclusion is that while the generalized Hough transform technique is useful 
for some classes of recognition tasks, it does not scale well, and is poorly suited 
to recognition in complex environments. For example, our analysis suggests that 
the Hough transform should be adequate for the recognition of objects with limited 
occlusion and moderate sensor uncertainty, using isolated points such as vertices 
as the matching features. This is supported by the empirical evidence of several 
researchers in the field (e.g., [Silberberg et. al. 84] [Linainmaa et al. 85]). At the 
same time, however, the analysis suggests that the method will scale poorly, when 
applied to complex, cluttered scenes, or when using extended features such as edges 
(which are subject to partial occlusion). 

It may seem somewhat surprising that the expected performance of the gener- 
alized Hough transform is so poor for complex images. Recall, however, that the 
operation was originally used to separate outliers from good data. Its first use in 
recognition was for relatively simple tasks, where the data corresponding to the cor- 
rect solution is a fairly large fraction of all the data. In contrast, for recognition in 
complex scenes the good data is a small fraction of the incorrect data, or "outliers" . 
It just turns out that the method does not scale very well to tasks where the amount 
of correct data is relatively small compared to the amount of incorrect data. 
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Appendix 

Analysis of the basic case 

In this section, we fully derive the relationship denning the redundancy of the Hough 
transform for two dimensional edge segments. 




v 0,x 



Figure 5. Range of feasible translations, for fixed 9 and with no position error. The line 
in the direction of _RT, denotes the set of feasible translations for a given value of 0. 

Suppose we are considering the matching of a data edge with a model edge. Consider 
the situation shown in Figure 5. This shows the set of consistent translations, for 
a given value of 0, say 6 C , where we ignore for now the effect of error e p . That is, 
for a given rotation, equation (1) defines a set of translations, which are illustrated 
in the figure. Now, as 9 varies, this line will vary, in particular, it will rotate about 
the center defined by nij, with a radius of ||Mj||. We want to determine the union 
of the projection of each such line into the x-y plane. The situation is shown in 
Figure 6. 

To find the area of this region, we use the following simple trick. Consider the 
lower hashed region shown in Figure 7. If we translate and rotate this region to the 
upper hashed region shown in the figure, then we see that the area of the remaining 
region is simply given by 



rhe rSh 

Je=o Jp=s t 



pdpdO = 



Se 2 



To derive the limits St and Sh, we can use the parameters from the known edges. 
Consider Figure 8. Here, Mj denotes the size of the vector ||Mj||, and 4> is the angle 
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Figure 6. Rotation of the line of Figure 5 through hg radians. 




Figure 7. Total area of the swept region. 



made between Mj and Tj. The distance X is simply given by X = (Lj — £j)/2. 
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Figure 8. 

Using the law of cosines, we have 

S e 2 = MJ + X 2 - 2MjX cos(tt - <j>) 
S h 2 = Mj + X 2 - 2MjXcos(<j)). 

Thus, the entire area covered is 



-!)■ 



A = 2Mjh e X sinl <j> 

Note that by symmetry, we can assume that <j> 6 [7r/2,7r], as the other cases are 
similar. 

This analysis, however, ignored the effects of the sensing error. In particular, 
we know that the translation can be determined only to within a ball of radius e p . 
Thus, the full area is swept out by first sweeping this ball along the line of feasible 
translations, and then sweeping that entire region through the angle kg. This is 
equivalent to expanding the region swept out by rotating the line over h$ to include 
any point within a distance e p of the boundary of this region. The additional area 
is shown in Figure 9. 

The largest circular piece (denote (1) in the figure) has an area given by 



Je=o Jp= 



P dpd6 = h e (Sh + €p } ~ Sh 

Sh 2 



Similarly, the smaller circular piece (denote (2) in the figure) has area 

S e 2 -(St-e p f 
he . 

The two rectangular pieces (denoted (3) and (4) in the figure) have area 

4Xe p . 

Finally, the four joining segments have a total angular extent of 27T so that they 
contribute an area of 

Combining these areas with the original area, we find that the area covered by 
the entire region is 

A(h 9 , e p , Mj, Lj, £j, 4>) = 2Mjh e X sin [<j> - |) + ^\ + 4e p X + e p h e [S h + S e }. 
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Figure 9. Additional region of feasible translations due to sensing error. 



Now, we need to find a lower bound for the number of buckets that are in- 
tersected by such an area. The simplest lower bound, which is not a tight one, is 
given by assuming that the area is square, and can be tightly packed into the x-y 
portion of the Hough buckets. This may badly underestimate the number of buckets 
intersected by a volume, but it provides a convenient starting place. If the region is 
a tightly packed square, then the minimum number of buckets is given by 

' A{h 8 ,e p ,Mj,Lj,lj,<l>) ~ 
h] 

Now, this region corresponds to the number of buckets intersected, as the rota- 
tion component varies over the dimension of a single bucket. Thus, the redundancy 
factor for pose clustering, b, i.e. the number of buckets into which a single data- 
model pairing casts a vote is bounded below by 



6> 



2e a 
he 



A(he,e p ,Mj,Lj,£j,4>) 



hi 



where the bound on angular error is given by 



tan" 



2e % 



,\A 2 - 4 < 



and where the area is given by 

A{h6,e p ,Mj,Lj,lj,4)) = Mjh e (Lj-l s )8m (V - ^+xe t p +2e p (L J -£j)+€ p he[S h +Si]. 
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The measurements in which we are interested depend on the relationship be- 
tween dimensions of the object and the tesselation of the Hough space. We can 
simplify our expressions, by using relative measurements. In particular, we let 



IE r* = II 
ht h t 






M = — 
ht 



so that the redundancy of the Hough space is 

"2e 



b> 



\A*(h t ,c;,M},L*j,ri,<f>)\ 



where 



€ a = tan 



-l 



2e; 



,yV*) 2 -4(e;) : 

A*{h 9 ,e* p ,M},L*j,eiA) = M}h e (L*j - tysm (<£- |) + ir {erf 

+ 2e;(L5-r j ) + e;h 9 [sz + s;] 

S* h = \j{M*jf + \{L*j- if - M*j {L*j - 1$ cos <t> 

S*e = \j{M}? + \(L*j- if) 2 + M} (L} - t$ cos ^. 

This gives careful bounds on the redundancy factor. We can get more useful 
bounds by considering the following case. We will assume that angle <f> between Mj 
and Tj is uniformly distributed over the range 

This allows us to estimate the expected value of the first term for A*. Finding the 
expected value for Sh and St involves elliptic integrals of the second kind, so we 
underestimate the area by finding the minimum value for Sk + Se, as <j> varies over 
its range. A straightforward application of the calculus leads to: 

S* h + S* e > 1M*. 

We will also assume that Lj = L for all model edges, that Mj — M for all 
model edges, and that the data edges are of equal length, 

lj = /3L 
for some parameter f3, -f- < (3 < 1. 

Under these conditions, the expected area is at least 

hj>_ , _ /-.*\2 

■K 



A*(h e , e* p , M*, L m ,0)> 2M*(1 - /3)L* 'f + * {e* p Y + 2e* p L*(l - (3) + 2e* p h e M* (5a) 

and the expected redundancy is at least 

~2e a 



b> 



he 



\A*(he,e;,M*,L\(3)] 



where the bound on angular error is given by 



e a = tan 



-1 



2e! 



7 (*■)»- 4(«;)a 



(56) 



(5c) 
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The expressions in equation (5) give a lower bound on the expected number 
of buckets intersected by a data-model pairing. This lower bound is not tight, as 
in deriving it we have assumed that the area of consistency in the x-y plane can 
be tightly packed into the tesselations of the Hough space. A better lower bound 
on the expected number of buckets can be obtained by accounting for the fact that 
the area of consistency may only partially intersect buckets along its border. An 
example is shown in Figure 10, in which the swept region has an area that is roughly 
equivalent to 6 buckets in size, but which actually intersects 14 different buckets. 
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Figure 10. The number of buckets intersected may be larger than the ratio of the area of 
the region to the area of a bucket. 

A simple means of accounting for this effect is to observe that on average, a bucket 
on the border of the swept region will be only half occupied. As well, the perimeter 
of the swept region can be easily shown to be 

P = (S h + S e ) h e + 2 (Lj - £j) + 2we p 
which is bounded, by our earlier analysis, by 

P > 2Mjh e + 2(Lj- lj) + 2we p . 
The minimum number of buckets intersected by this perimeter is 

P 

V2V 
If we normalize with respect to the bucket size, we have 

P* > 2M*jh B + 2 {L} - £*) + 2we* p . 

Since, on average, border buckets are half occupied, in place of A*, we can now use 

P* 



A* + 



2\/2 
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If we let 

A* + (h e ,e;,M*,L*,/3) = A*+ P * 



2V2 
> 2M*(1 - p)L*'^- + 7T (e* p Y + 2e* p L*(l - /3) + 2 € ;^M* 



.he , / *\2 



7T 



+ J-(2M*h e + 2(l-/3)L* + 2we p ) (3a) 



then the expected redundancy is at least 

"2£fi 
he 

where the bound on angular error is given by 



b> 



\A* + ] (36) 



e a = tan" 1 I . 2fp I . (3c) 

7(*) a - 4(*;)a 



Analysis of the scaled case 

Similar to the case of rigid objects, we need to formally derive the redundancy of 
the Hough transform for objects that can freely scale. 

To determine the redundancy factor for parameter hashing in the case of scale, 
we again want to determine the number of buckets consistent with a data-model 
pairing for a single slice of the x-y components of the transform space. Note that 
in this case, the transform space T is four dimensional, with an extra axis for the 
scale factor. Projecting the volume obtained as 6 varies over the bounds of a single 
bucket gives us the volume shown in Figure 6, where now the borders are functions 
of the scale factor. If we now look at the projection of the volume as A; is varied, 
we will get the region obtained by varying the region in Figure 6 over the range of 
values of k. This new region is shown in Figure 3. We need to determine the area 
spanned by this region. The heavy lines in Figure 3 break the total area into three 
portions. The previous analysis implies that the large portion has an area 

2h e M' J (k h )X' ij (k h )sm(<f>-^) 

where 

M'j(k) = kMj 

and where k varies from k{ to kh and Mj is the midpoint distance of the model face 
without any scaling. 

The circular segment of the area in Figure 3 has an area given by 
Je=o J P =s t (k t ) 2 
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where 



Se(k) 2 = M'j(kf + Xljik) 2 + 2M' J {k)X' ij {k)cos<j> 
S h {k) 2 = M'j(k) 2 + X'^k) 2 - 2M' J (k)X' ij (k)cos<t>. 

To get the area of the slice of the triangular portion, we use the law of sines to derive 
[X'ijikrfM'jfa) - X^kWjiki)] sin^. 

Hence, the area of consistent translations, ignoring the error e p is given by 

A s =h $ k h Mj(k h Lj - tj) sin [<f> - -J 



+ ho(k h -ke) 



h+h Us + L -±- 



Mj 



h e — cos <j> \{k\ + k\ )Lj - (k h + k e )£j] 



+ Mj{k h - k e ) 



kh + ki tj 



sin^>. 



We must also account for error in measuring the position. As in the previous 
case, the addition in this case is found by expanding the area in Figure 3 by a 
distance e p , as shown in Figure 11. 




Figure 11. Region of translation space consistent with scale variation and angle variation. 



Using techniques similar to those employed in the case of no scaling, we find that 
the additional area, due to sensing error is given by 

hee P [Sh(k h ) + St(k e )] + e p [(k h + k ( )Lj - 2tj] 

+ e p [S h (k h ) - S h (ke) + S t (k h ) - Se(ke)] + *e 2 



Combining these two results yields an area of 



A s =hek h Mj(k h Lj - £j) sin \j> - -J 



-T~ \ Mj + —)- — 



+ h e {k h - kt) 

-h e ^- cos <t> \{k\ + kj)Lj - (k h + kt)£j] 



+ Mj(k h - k t ) 



kh + kg £/ 



sin<£ 



+ h e e p [S h (k h ) + S e (k t )] + € P [(k h + k £ )Lj - 21 j] 
+ e p [S h (k h ) - S h (h) + S e (k h ) - S e (k e )] + ire 2 p . 
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Similar to the case of no scale, we can bound this below by finding the minimum 
value taken on by the Sh and Si terms, yielding 



A a >h 6 khMj(k h Lj - f;)sin (<£ - |-J 



kh + kif Mj 2 + Lj 



(-'+¥) 



+ he(h -kt) 

- h 9 ^-cos<t> [(k 2 h + kj)Lj - (k h + ke)£j] 



Ljlj 



+ Mj(k h - kt) 



h + ke £j 

-^— Lj ~l 



+ hg€ r 



(k h + k e )Mj - ^-^-Lj 



sin<^ 
+ € p {{k h + k()Lj - 2£j] 



+ 2e p (k h - k e ) Mj + ne 2 



As before, we can take the expected value of this expression as <f> varies uniformly 
over the range [7r/2,7r]. This region corresponds to the number of buckets intersected 
as the rotation component varies over the range of a single bucket, and as the sacle 
factor varies over the range of a single bucket. Note that in this case, the area of 
the translation component of the Hough space that is consistent with an assignment 
is actually a function of the scale factor, rather than just a function of the size of 
the Hough buckets and the properties of the object and the sensing errors. We can 
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rewrite this equation in terms of the range of variation in scale, Ak: 



A s (Ak, k h ) =h e k h Mj {k h Lj - £j) - 



+ h e Ak 



(■>-¥)(*♦$)- 



Ljtj 



+ — Mj [{2k\ + 2k h Ak + Ak 2 )Lj - (2k h + Ak)£j] 



+ MjAk 




2 

7T 


+ he€ p 


Afc 
(2k h -Ak)Mj-—Lj 




+ 2e p A 


kA 


fj + *e 2 p . 





+ 



2e p (2 (k h -^pjLj- 2i^j 



As in the previous case, we can normalize the measurements relative to the 
dimensions of the Hough spacing, h t , so that the area is given by 

A* s (Ak,k h ) =h e k h M} {k h L*j - £*) I 

+A , A ,[(^_^)*( W )H^)-£is; 

+ ^-MJ \{1k\ + 2k h Ak + A* 2 )i; - (2*» + A*)f'] 
+ MJM[( ik -f) i; -|]i 



Ak 
(2k h -Ak)M*j-^L*j 



+ 



K (2 (** -^-)lj- u*) 



+ 2e;AkM5 + w(€* p y. 



Suppose we define the full range of possible scale factors to be [1, k max ], so that 
the model is defined as the smallest possible instance of an object. Then to count the 
redundancy factor in this case, we must sum the number of buckets obtained over 
all possible scale factors. If the spacing of the Hough buckets in the scale dimension 
is hk, then this sum is given by: 



2e a 



kyn,ax 



where 



A*(i s -ma,x{l,-jr^-},i s h k ) + J^ 
max(l, ^-) 



2^ 
he 



\A* S (h k , ih k )] 



Is = 



hk 



is the starting point for the scale summation, and where the first term in the ex- 
pression captures any partial inclusion of a bucket. 

We have assumed that k max is some integer multiple of hk. As before, the 
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bound on angular error is given by 



tan 



-l 



2e* 



Similar to the non-scaled case, we can obtain tighter bounds by considering the 
buckets on the edge of the region, which are likely to be only partially intersected 
by the region. The perimeter of Figure 9 can be shown to equal 

P. =he (Se(kt) - c p ) + k e Lj - tj + S h (k k ) - S h (k t ) 

he (S h (k h ) + e p ) + k h Lj - £j + S e (k h ) - Se(k t ) + 2xe p 

and this is bounded by 

Ps > h e ({k h + k e )Mj - kh ~ *% ) + (k h + k e )Lj - 2£j + 2(k h - k t )Mj + 2ire p . 

If we normalize with respect to bucket size, we get 

P* s > he ((k h + k e )M} - h^hiyj + ( kh + k e )L*j - 21) + 2{k h - k e )M*j + 2ire* p . 

Hence, a better bound on the expected redundancy is given by 



b 3 = 



2€a 

he 



A* + (i s — max{l, 



L*h, 



■},ishk) 



K max 



2£a 

hg 



\A; t+ (h k ,ih k )] 



where 



and where 



t s = 



max(l, ft 



A s ,+ — A s + 



p* 

2\/2* 
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